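Recording the dynamics of unscripted human interactions in the wild is challenging because of the delicate trade-offs between several factors: participant privacy, ecological validity, data fidelity, and logistical overhead. To address these, following a "datasets for the community by the community" ethos, we propose the Conference Living Lab (ConfLab): a new concept for multimodal multisensor data collection of in-the-wild social conversations. For the first instantiation of ConfLab described here, we organized a real-life professional networking event at a major international conference. Involving 48 conference attendees, the dataset captures a diverse mix of status, acquaintance, and networking motivations. Our capture setup improves upon the data fidelity of prior in-the-wild datasets while retaining privacy sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view, and custom wearable sensors with onboard recording (full 9-axis IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based proximity. Additionally, we developed custom solutions for distributed hardware synchronization at acquisition, and time-efficient continuous annotation of body keypoints and actions at high sampling rates. Our benchmarks showcase some of the open research tasks related to in-the-wild privacy-preserving social data analysis: keypoint detection from overhead camera views, skeleton-based no-audio speaker detection, and F-formation detection.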
translated by 谷歌翻译
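Landmarks often play a key role in face analysis, but many aspects of identity or expression cannot be represented by sparse landmarks alone. Thus, in order to reconstruct faces more accurately, landmarks are often combined with additional signals like depth images or techniques like differentiable rendering. Can we keep things simple by just using more landmarks? In answer, we present the first method that accurately predicts 10x as many landmarks as usual, covering the whole head, including the eyes and teeth. This is accomplished using synthetic training data, which guarantees perfect landmark annotations. By fitting a morphable model to these dense landmarks, we achieve state-of-the-art results for monocular 3D face reconstruction in the wild. We show that dense landmarks are an ideal signal for integrating face shape information across frames by demonstrating accurate and expressive facial performance capture in both monocular and multi-view scenarios. This approach is also highly efficient: we can predict dense landmarks and fit our 3D face model at over 150 FPS on a single CPU thread. Please see our website: https://microsoft.github.io/denselandmarks/.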
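The default paradigm for forecasting human behavior in social conversations involves choosing a specific future semantic event of interest (e.g., speaker turn changes, group leaving) and then identifying its relationship with low-level nonverbal cues. A common hurdle in such top-down approaches is the limited availability of event-labeled data for supervised learning, stemming from the infrequency of such events. To address this challenge, we propose casting forecasting as a novel bottom-up self-supervised problem that leverages the larger amount of low-level behavioral cues. We formalize the task of Social Cue Forecasting (SCF) and characterize the specific modeling challenges involved. To address these, we draw on key observations from the social science literature and propose the Social Process (SP) models: socially aware sequence-to-sequence models that view each conversation group as a meta-learning task to account for group-specific dynamics. Our SP models learn activity-agnostic representations of future cues for each participant, while capturing global uncertainty by jointly reasoning about the future of all members of the group. For this new task of SCF, improved empirical performance over non-meta-learning models on real behavior data validates our meta-learning approach. Moreover, ablations and comparisons against meta-learning models with similar assumptions validate our specific modeling choices for this task.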
Previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in types, scale, pose, occlusion, and lighting conditions. Current object detectors like YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on the FGVD dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
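As a rough illustration of what a three-level label hierarchy implies for evaluation, the sketch below scores predictions at each level of the hierarchy. The level names (type, make, model), the label strings, and the helper `accuracy_at_level` are hypothetical and are not taken from the FGVD dataset or its tooling.

```python
from typing import List, Tuple

# Hypothetical three-level label: (vehicle type, make, model).
Label = Tuple[str, str, str]


def accuracy_at_level(predictions: List[Label], targets: List[Label], level: int) -> float:
    """Fraction of predictions whose first `level` components match the target."""
    if not 1 <= level <= 3:
        raise ValueError("level must be 1, 2, or 3")
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    hits = sum(1 for p, t in zip(predictions, targets) if p[:level] == t[:level])
    return hits / len(targets)


# Made-up labels, for illustration only.
targets = [
    ("car", "MakeA", "ModelX"),
    ("car", "MakeA", "ModelY"),
    ("truck", "MakeB", "ModelZ"),
    ("two-wheeler", "MakeC", "ModelW"),
]
predictions = [
    ("car", "MakeA", "ModelX"),    # correct at every level
    ("car", "MakeA", "ModelX"),    # correct type and make, wrong model
    ("truck", "MakeD", "ModelQ"),  # correct type only
    ("car", "MakeA", "ModelX"),    # wrong at every level
]

for level in (1, 2, 3):
    print(level, accuracy_at_level(predictions, targets, level))
```

Accuracy can only stay the same or drop as the level gets finer, since a match at level 3 requires a match at levels 1 and 2.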
Text-to-text generation models have increasingly become the go-to solution for a wide variety of sequence labeling tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect -- of vital practical importance -- has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best way to obtain well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach -- which leverages statistics from top-$k$ predictions by a beam search -- significantly reduces calibration errors of the predictions of a generative sequence labeling model.
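To make "calibration error" concrete, here is a minimal sketch of one common metric, the expected calibration error (ECE) with equal-width confidence bins. The abstract does not say which calibration metric or binning the authors use, so the metric choice, the function name, and the toy confidences below are assumptions for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and equal in length")
    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        conf_sum[idx] += c
        hit_sum[idx] += int(ok)
        count[idx] += 1
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        mean_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / total) * abs(accuracy - mean_conf)
    return ece


# Toy per-span confidences and correctness flags (made up for illustration).
span_confidences = [0.95, 0.95, 0.95, 0.95, 0.65, 0.65]
span_is_correct = [True, True, True, False, True, False]
ece = expected_calibration_error(span_confidences, span_is_correct)
print(f"ECE = {ece:.3f}")
```

A perfectly calibrated model would have an ECE of zero: among spans predicted with confidence 0.95, about 95% would be correct.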
With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions. However, there has been no work on generating explanations on the fly during model training and utilizing them to improve the expressive power of the underlying GNN models. In this work, we introduce a novel explanation-directed neural message passing framework for GNNs, EXPASS (EXplainable message PASSing), which aggregates only embeddings from nodes and edges identified as important by a GNN explanation method. EXPASS can be used with any existing GNN architecture and subgraph-optimizing explainer to learn accurate graph embeddings. We theoretically show that EXPASS alleviates the oversmoothing problem in GNNs by slowing the layer-wise loss of Dirichlet energy and that the embedding difference between the vanilla message passing and EXPASS framework can be upper bounded by the difference of their respective model weights. Our empirical results show that graph embeddings learned using EXPASS improve the predictive performance and alleviate the oversmoothing problems of GNNs, opening up new frontiers in graph machine learning to develop explanation-based training frameworks.
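The link between Dirichlet energy and oversmoothing can be seen on a toy graph. The sketch below is not EXPASS: it applies plain mean aggregation with self-loops and uses an unnormalized energy, half the sum of squared embedding differences over edges, which is an assumed variant that may differ from the paper's definition. The graph and features are made up.

```python
from typing import List, Tuple

Vector = List[float]
Edge = Tuple[int, int]


def dirichlet_energy(features: List[Vector], edges: List[Edge]) -> float:
    """Unnormalized Dirichlet energy: 0.5 * sum over edges of ||x_i - x_j||^2."""
    total = 0.0
    for i, j in edges:
        total += sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return 0.5 * total


def mean_aggregate(features: List[Vector], edges: List[Edge]) -> List[Vector]:
    """One round of mean aggregation over each node's neighbors plus itself."""
    neighbors = {i: [i] for i in range(len(features))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[n][d] for n in neighbors[i]) / len(neighbors[i]) for d in range(dim)]
        for i in range(len(features))
    ]


# Path graph 0 - 1 - 2 - 3 with made-up 2-D node features.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]

energy_before = dirichlet_energy(features, edges)
energy_after = dirichlet_energy(mean_aggregate(features, edges), edges)
print(energy_before, energy_after)
```

On this example a single aggregation round already lowers the energy, meaning neighboring embeddings have moved closer together. The abstract's claim is that restricting aggregation to nodes and edges an explainer marks as important slows this loss.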
Recently, Robey et al. proposed a notion of probabilistic robustness, which, at a high level, requires a classifier to be robust to most but not all perturbations. They show that for certain hypothesis classes where proper learning under worst-case robustness is \textit{not} possible, proper learning under probabilistic robustness \textit{is} possible with sample complexity exponentially smaller than in the worst-case robustness setting. This motivates the question of whether proper learning under probabilistic robustness is always possible. In this paper, we show that this is \textit{not} the case. We exhibit examples of hypothesis classes $\mathcal{H}$ with finite VC dimension that are \textit{not} probabilistically robustly PAC learnable with \textit{any} proper learning rule. However, if we compare the output of the learner to the best hypothesis for a slightly \textit{stronger} level of probabilistic robustness, we show that not only is proper learning \textit{always} possible, but it is possible via empirical risk minimization.
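The difference between worst-case robustness and robustness to "most but not all" perturbations can be sketched on a one-dimensional toy problem. The code below assumes a uniform distribution over a finite perturbation set and a tolerance parameter `rho`. These choices, the threshold classifier, and all function names are illustrative assumptions, not the formal definitions from the paper.

```python
from typing import Callable, Sequence


def failure_rate(
    classifier: Callable[[float], int], x: float, y: int, perturbations: Sequence[float]
) -> float:
    """Fraction of perturbations (uniformly weighted) on which the prediction is wrong."""
    failures = sum(1 for delta in perturbations if classifier(x + delta) != y)
    return failures / len(perturbations)


def is_worst_case_robust(classifier, x, y, perturbations) -> bool:
    """Robust only if no perturbation in the set changes the prediction."""
    return failure_rate(classifier, x, y, perturbations) == 0.0


def is_probabilistically_robust(classifier, x, y, perturbations, rho: float) -> bool:
    """Robust if at most a rho-fraction of perturbations cause an error."""
    return failure_rate(classifier, x, y, perturbations) <= rho


def threshold_classifier(x: float) -> int:
    """Toy hypothesis: predict 1 for non-negative inputs, 0 otherwise."""
    return 1 if x >= 0.0 else 0


# Perturbations -0.5, -0.4, ..., 0.5 around the labeled point (x, y) = (0.25, 1).
perturbations = [k / 10 for k in range(-5, 6)]
x, y = 0.25, 1

print(failure_rate(threshold_classifier, x, y, perturbations))
print(is_worst_case_robust(threshold_classifier, x, y, perturbations))
print(is_probabilistically_robust(threshold_classifier, x, y, perturbations, rho=0.3))
```

Here 3 of the 11 perturbations push the point across the threshold, so the classifier fails the worst-case test at this point but passes the relaxed test for any tolerance of at least 3/11.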
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
A classical result in learning theory shows the equivalence of PAC learnability of binary hypothesis classes and the finiteness of VC dimension. Extending this to the multiclass setting was an open problem, which was settled in a recent breakthrough result characterizing multiclass PAC learnability via the DS dimension introduced earlier by Daniely and Shalev-Shwartz. In this work we consider list PAC learning where the goal is to output a list of $k$ predictions. List learning algorithms have been developed in several settings before and indeed, list learning played an important role in the recent characterization of multiclass learnability. In this work we ask: when is it possible to $k$-list learn a hypothesis class? We completely characterize $k$-list learnability in terms of a generalization of DS dimension that we call the $k$-DS dimension. Generalizing the recent characterization of multiclass learnability, we show that a hypothesis class is $k$-list learnable if and only if the $k$-DS dimension is finite.
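In list learning, a predictor outputs up to $k$ candidate labels and is charged an error only when the true label is missing from the list. A minimal sketch of the corresponding empirical error follows. The toy predictors, the sample, and the function name `list_error` are hypothetical and only illustrate the loss, not any learning algorithm from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def list_error(
    list_predictor: Callable[[int], List[int]], samples: Sequence[Tuple[int, int]], k: int
) -> float:
    """Empirical k-list error: fraction of samples whose label is not in the predicted list."""
    mistakes = 0
    for x, y in samples:
        predicted = list_predictor(x)
        if len(predicted) > k:
            raise ValueError("predictor returned more than k labels")
        if y not in predicted:
            mistakes += 1
    return mistakes / len(samples)


def single_label_predictor(x: int) -> List[int]:
    """Toy 1-list predictor over labels {0, 1, 2}."""
    return [x % 3]


def two_label_predictor(x: int) -> List[int]:
    """Toy 2-list predictor: the same guess plus one extra candidate."""
    return [x % 3, (x + 1) % 3]


# Made-up labeled sample of (input, label) pairs.
samples = [(0, 0), (1, 2), (2, 1), (3, 1)]

print(list_error(single_label_predictor, samples, k=1))
print(list_error(two_label_predictor, samples, k=2))
```

Allowing a longer list can only help on a fixed sample when the shorter list is contained in the longer one, which is why list learnability is a weaker requirement than standard multiclass learnability.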
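Sequence labeling is a core NLP task that forms the backbone of many applications. Supervised learning of Seq2Seq models (like T5) has shown great success on these problems. However, there remains a significant disconnect between the training objectives of these models and the metrics and desiderata we care about in practical applications. For example, a practical sequence tagging application may want to optimize for a certain precision-recall trade-off (of the top-$k$ predictions), which is quite different from the standard objective of maximizing the likelihood of the gold labeled sequence. Thus, to bridge this gap, we propose GROOT, a simple yet effective framework for Generative Reward Optimization Of Text sequences. GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function. Using an iterative training regime, we first generate prediction candidates, then correct errors in them, and finally contrast those candidates (based on their reward values). As demonstrated via extensive experiments on four public benchmarks, GROOT significantly improves all reward metrics. Furthermore, GROOT also leads to improvements of the overall decoder distribution, as evidenced by the quality gains of the top-$k$ candidates.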
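The idea of scoring candidates with a black-box reward and contrasting them by reward value can be sketched as follows. This is not the GROOT training procedure: the F-beta reward, the span representation, the pairing rule, and every name below are assumptions chosen to illustrate how a precision-recall trade-off changes which candidate is preferred.

```python
from typing import Callable, List, Set, Tuple

Span = Tuple[str, str]  # hypothetical (label, text) pair


def f_beta_reward(predicted: Set[Span], gold: Set[Span], beta: float = 1.0) -> float:
    """F-beta over predicted spans; beta < 1 favors precision, beta > 1 favors recall."""
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def contrast_pairs(
    candidates: List[Set[Span]],
    gold: Set[Span],
    reward: Callable[[Set[Span], Set[Span]], float],
) -> List[Tuple[int, int]]:
    """Return (better, worse) index pairs for candidates with strictly different rewards."""
    scored = sorted(((reward(c, gold), i) for i, c in enumerate(candidates)), reverse=True)
    pairs = []
    for a in range(len(scored)):
        for b in range(a + 1, len(scored)):
            if scored[a][0] > scored[b][0]:
                pairs.append((scored[a][1], scored[b][1]))
    return pairs


# Made-up gold spans and candidate predictions.
gold = {("PER", "Ada Lovelace"), ("LOC", "London")}
candidates = [
    {("PER", "Ada Lovelace"), ("LOC", "London")},                     # exact match
    {("PER", "Ada Lovelace")},                                        # precise, low recall
    {("PER", "Ada Lovelace"), ("LOC", "London"), ("ORG", "London")},  # full recall, one extra
    {("LOC", "Paris")},                                               # nothing correct
]

pairs = contrast_pairs(candidates, gold, f_beta_reward)
print(pairs)
```

With the default `beta=1.0` the over-predicting candidate (index 2) outranks the under-predicting one (index 1). Passing `beta=0.5` reverses that order, which is the kind of application-specific preference a likelihood objective does not capture.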